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The analysis of the joint cumulative distribution function (CDF) 
with bivariate event time data is a challenging problem both theoreti- 
cally and numerically. This paper develops a tensor spline-based sieve 
maximum likelihood estimation method to estimate the joint CDF 
with bivariate current status data. The /-splines are used to approxi- 
mate the joint CDF in order to simplify the numerical computation of 
a constrained maximum likelihood estimation problem. The general- 
ized gradient projection algorithm is used to compute the constrained 
optimization problem. Based on the properties of B-spline basis func- 
tions it is shown that the proposed tensor spline-based nonparamet- 
ric sieve maximum likelihood estimator is consistent with a rate of 
convergence potentially better than 7i 1//3 under some mild regularity 
conditions. The simulation studies with moderate sample sizes are 
carried out to demonstrate that the finite sample performance of the 
proposed estimator is generally satisfactory. 

1. Introduction. In some applications, observation of random event 
time T is restricted to the knowledge of whether or not T exceeds a random 
monitoring time C. This type of data is known as current status data and 
sometimes referred to as interval censoring case 1 [Groeneboom and Well- 
ner (1992)]. Current status data arise naturally in many applications; see, 
for example, the animal tumorigenicity experiments by Hoel and Walburg 
(1972) and Finkelstein and Wolfe (1985); the social demographic studies of 
the distribution of the age at weaning by Diamond, McDonald and Shah 
(1986), Diamond and McDonald (1991) and Grummer-Strawn (1993); and 
the studies of human immunodeficiency virus (HIV) and acquired immunod- 
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eficiency syndrome (AIDS) by Shiboski and Jewell (1992) and Jewell, Malani 
and Vittinghoff (1994). 

The univariate current status data have been thoroughly studied in the 
statistical literature. Groeneboom and Wellner (1992) and Huang and Well- 
ner (1995) studied the asymptotic properties of the nonparametrc maximum 
likelihood estimator (NPMLE) of the CDF with current status data. Huang 
(1996) considered Cox proportional hazards model with current status data 
and showed that the maximum likelihood estimator (MLE) of the regression 
parameter is asymptotically normal with yfn convergence rate, even through 
the MLE of the baseline cumulative hazard function only converges at n 1 / 3 
rate. 

Bivariate event time data occur in many applications as well. For ex- 
ample, in an Australian twin study [Duffy, Martin and Matthews (1990)], 
the researchers were interested in times to a certain event such as a dis- 
ease or a disease-related symptom in both twins. NPMLE of the joint CDF 
of the correlated event times with bivariate right censored data was stud- 
ied by Dabrowska (1988), Prentice and Cai (1992), Pruitt (1991), van der 
Laan (1996) and Quale, van der Laan and Robins (2006). As an alterna- 
tive, Kooperberg (1998) developed a tensor spline estimation of the log- 
arithm of joint density function with bivariate right censored data. How- 
ever, asymptotic properties of Kooperberg's estimate are unknown. Shih 
and Louis (1995) proposed a two-stage semiparametric estimation proce- 
dure to study the joint CDF with bivariate right censored data, in which 
the joint distribution of the two event times is assumed to follow a bivariate 
Copula model [Nelsen (2006)]. 

For bivariate interval censored data, the conventional NPMLE was origi- 
nally studied by Betensky and Finkelstein (1999) and followed by Wong and 
Yu (1999), Gentleman and Vandal (2001), Song (2001) and Maathuis (2005). 
A typical numerical algorithm for computing the NPMLE constitutes two 
steps [Song (2001) and Maathuis (2005)]: in the first stage the algorithm 
searches for small rectangles with nonzero probability mass; in the second 
stage those nonzero probability masses are estimated by maximizing the log 
likelihood with a reduced number of unknown quantities. Sun, Wang and 
Sun (2006) and Wu and Gao (2011) adopted the same idea used by Shih 
and Louis (1995) to study the joint distribution of CDF for bivariate interval 
censored data with Copula models. 

This paper studies bivariate current status data, a special type of bivariate 
interval censored data. Let (Ti,T2) be the two event times of interest and 
{C\,C%) the two corresponding random monitoring times. In this setting, 
the observation of bivariate current status data consists of 



(1.1) 



X = (Ci,C 2 , Ai = I(Ti < Ci), A 2 = I(T 2 < C 2 )), 
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where /(■) is the indicator function. For bivariate current status data, Wang 
and Ding (2000) adopted the same approach proposed by Shih and Louis 
(1995) to study the association between the onset times of hypertension 
and diabetes for Taiwanese in a demographic screening study. In a study 
on HIV transmission, Jewell, van der Laan and Lei (2005) investigated the 
relationship between the time to HIV infection to the other partner and 
the time to diagnosis of AIDS for the index case by studying some smooth 
functionals of the marginal CDFs. In both examples, the bivariate event 
times have the same monitoring time, that is, C\ = C 2 = C. In this paper, 
we propose a tensor spline-based sieve maximum likelihood estimation of the 
joint CDF for bivariate current status data in a general scenario in which C\ 
and C 2 are allowed to be different. The proposed method is shown to have 
a rate of convergence potentially better than ra 1 / 3 and it can simultaneously 
estimate the two marginal CDFs along with the joint CDF. 

The rest of the paper is organized as follows. Section 2 characterizes the 
spline-based sieve MLE f n = {F n ,F nt i,F n ^ 2 ), where F n is the tensor spline- 
based estimator of the joint CDF, and F n ^ and F n ^ 2 are the spline-based 
estimators of the two corresponding marginal CDFs. Section 3 presents two 
asymptotic properties (consistency and convergence rate) of the proposed 
spline-based sieve MLE. Section 4 discusses the computation of the spline- 
based estimators. Section 5 carries out a set of simulation studies to examine 
the finite sample performance of the proposed method and compares it to the 
conventional NMPLE computed with the algorithm proposed by Maathuis 
(2005). Section 6 summarizes our findings and discusses some related prob- 
lems. Section 7 provides proofs of the theorems stated in the early section. 
Details of some technical lemmas that are used for proving the theorem and 
their proofs are included in a supplementary file. 

2. Tensor spline-based sieve maximum likelihood estimation method. 

2.1. Maximum likelihood estimation. We consider a sample of n i.i.d. bi- 
variate current status data denoted in (1.1), {(ci^, c 2tk , <5i,fc, o~2,k) : fe = 1, 2, . . . , 
n}. Suppose that (Ti,T2) and (Ci,C2) are independent. Then the log-likeli- 
hood for the observed data can be expressed by 

n 

Z n (-;data) = ^2{5i, k 5 2 , k logP(T 1 <c ljfe ,T 2 < c 2)fe ) 
k=i 

+ <Wl -<5 2 ,fc)logP(Ti <ci ifc ,T 2 >c 2 ,k) 

(2.1) 

+ (1 - 5i,fc)<5 2 ,fclogP(Ti > ci )fc ,T 2 < c 2 ,fc) 

+ (1 - «5 ljfc )(l - 5 2tk ) logP(Ti > c ljfc ,T 2 > c 2ife )}. 
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Denote F the joint CDF of event times (T±,T 2 ) and F\ and F 2 the 
marginal CDFs of F, respectively. The log-likelihood (2.1) can be rewrit- 
ten as 

n 

Z n (F,Fi,F 2 ;data) = y~]{6 ltk 8 2 ,k log-F(ci,fc, c 2jk ) 
fe=i 

(2.2) + S 1)k (l - S 2jk ) log^c^) - F(c hk ,c 2 , k )) 

+ (1 ~ S ljk )S 2tk log(F 2 (c 2jk ) - F(c hk ,c 2 , k )) 

+ (1 - * 1)fc )(l - S 2jk ) log(l - Fi(c 1)fc ) - F 2 (c 2 , fc ) 

+ F(ci >fc ,C2,fc))}. 

A class of real-valued functions defined in a bounded region [Li,f/i] x 
[L 2 , U 2 ] is denoted by 

T={(F(s,t),F 1 (s),F 2 (t)): for (s, t) £ [Li, U{\ x [L 2 ,[/ 2 ]}, 

where i 7 , F\ and F 2 satisfy the following conditions in (2.3): 

0<F(s,t), 

F(s',t)<F(s",t), 

F(s,t')<F(s,t"), 

[F(s", t") - F(s', t")] - l(F(s", - F(s', t')] > 0, 

(2.3) F 1 (s)-F(s,t)>0, 

F 2 (t)-F{s,t)>0, 

[F 1 (s")-F 1 (s')]-[F(s",t)-F(s',t)]>0, 

[F 2 (t")-F 2 (t')]-[F(s,t")-F(s,t')}>0, 

[l- Fl (s)]-[F 2 (t)-F(s,t)]>0 

for s' < s" with s' and s" on [L x ,Ui\, and t' < £" with t' and t" on [L 2 , U 2 ). 

It can be easily argued that if F is a joint CDF and F\ and i^ 2 are its 
two corresponding marginal CDFs, (F,F\,F 2 ) £ J 7 . On the other hand, for 
any (F,Fi,F 2 ) £ there exists a bivariate distribution such that F is the 
joint CDF and F\ and F 2 are its two marginal CDFs. Throughout this 
paper, ^0,-^0,1 and Fq 2 are denoted for the true joint and marginal CDFs, 
respectively. The NPMLE for (Fq, Fq^i, Fq ;2 ) is defined as 

(2.4) (F n ,F ntl ,F n>2 ) = argmax l n (F,F l ,F 2 ; data). 

The NPNLE of (2.4) is, in general, a challenging problem both numerically 
and theoretically. The conventional NPMLE of F is constructed by a larger 
number of unknown quantities representing the masses in small rectangles. 
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Solving for the NPMLE needs to perform a constrained high-dimensional 
nonlinear optimization [Betensky and Finkelstein (1999), Wong and Yu 
(1999), Gentleman and Vandal (2001), Song (2001), Maathuis (2005)]. Though 
the conventional NPMLE of (2.4) can be efficiently computed using the al- 
gorithm developed by Maathuis (2005), it is, however, well known that the 
conventional NPMLE is not uniquely determined [Song (2001), Maathuis 
(2005)]. In an unpublished Ph.D. dissertation, Song (2001) showed that the 
conventional NPMLE of joint CDF with bivariate current status data can 
achieve a global rate of convergence of re 3 / 10 in Hellinger distance, which is 
slightly slower than that of the NPMLE with univariate current status data. 

This paper adopts a popular dimension reduction method through spline- 
based sieve maximum likelihood estimation. The main idea of the spline- 
based sieve method is to solve problem (2.4) in a subclass of T that "ap- 
proximates" to T when sample size enlarges. The advantages of the pro- 
posed method are that the spline-based sieve MLE is unique, and it is easy 
to compute and analyze. The univariate spline-based sieve MLEs for various 
models were studied by Shen (1998), Lu, Zhang and Huang (2007, 2009), 
Zhang, Hua and Huang (2010) and Lu (2010). Other problems related to 
applications of univariate shape-constrained spline estimations have recently 
been studied as well. For example, Meyer (2008) studied the inference us- 
ing shape-restricted regression spline functions and Wang and Shen (2010) 
studied S-spline approximation for a monotone univariate regression func- 
tion based on grouped data. For analyzing bivariate distributions, the tensor 
spline approach [de Boor (2001)] has been studied by Stone (1994) in a non- 
parametric regression setting, by Koo (1996) and Scott (1992) in a multivari- 
ate density estimation without censored data and, as noted in Section 1, by 
Kooperberg (1998) in the bivariate density estimation with bivariate right 
censored data. Recently, an application of the tensor £>-spline estimation 
of a bivariate monotone function has also been investigated by Wang and 
Taylor (2004) in a biomedical study. 

In this paper, we propose a partially monotone tensor spline estimation 
of the joint CDF. To solve problem (2.4), the unknown joint CDF is approx- 
imated by a linear combination of the tensor spline basis functions, and its 
two marginal CDFs are approximated by linear combinations of spline basis 
functions as well. Then the problem converts to maximizing the sieve log 
likelihood with respect to the unknown spline coefficients subject to a set of 
inequality constraints. 

2.2. B- spline- based estimation. In this section, the spline-based sieve 
maximum likelihood estimation problem is reformulated as a constrained 
optimization problem with respect to the coefficients of i?-spline functions. 

Suppose two sets of the normalized i?-spline basis functions of order I 
[Schumaker (1981)], {N^' 1 (s)}^ and {N<j 2) ' 1 (t)}^ are constructed in 
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[Li,£/i] x [L2,U2] with the knot sequence satisfying L\ = u\ = 

■■■ = ui < ui+i < ■■■ < u Pn < u Pn+ \ = ■■■ = u Pn+ i = U\ and the knot se- 
quence {vj}^ 1 satisfying L 2 = v x = ■ ■ ■ = v i < v i+1 <■■■ <v qn < v qn+1 = 
• • • = v qn+ i = U2, where p n = 0(n v ) and q n = 0{n v ) for some < v < 1. 



fin =\r n = (F n ,F n>1 ,F nt2 ):F n {s,t) = ^ ^ a 4J ivf y (s)N^ 1 (t), 



with a = («!,!,..., a Pn!qn ), §_= (fa, . . . , @ Pn ) and 7 = (71, . . . ,7 9 J subject to 
the following conditions in (2.5): 

"1,1 > 0, 

aij+l - aij > for j = 1, . . . ,q n - 1, 
(2.5) - > for i = l, ...,p n -l, 

(Oii+lJ+l - a i+l,j) - («i,j+l ~ a i,j) > 

for i = l,...,p n - l,j = l,...,q n ~ 1, 

A - ai, g „ > 0, 71 - a Pn ,i > 0, 

(A+l - fa) ~ (oi+l,g„ - «i,?„) > for z = 1, . . . ,p n - 1, 
(7j+i - 7j) - (opnj+i - a Pn,j) > for j = 1, . . . , g n - 1, 

(2.5) is established corresponding to the constraints given in (2.3). Using the 
properties of S-spline, a straightforward algebra yields Q, n C T . To obtain 
the tensor U-spline-based sieve likelihood with bivariate current status data, 
r n = (F n ,F n ^,F n ^) 6 fi n is substituted into (2.2) to result in 

l n (0L->fal'i data) 



Define 



Pn 



i=i j=l 
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(2.6) + (1 - 5 lik )5 2 ,klog< £t#] 2) 'W) 

Pn <?n 

-EE^ (1) *V*W 2), W) 
i=i j=i 

f Pn 

r(l),i/ 



+ (1 - ff lifc )(l - 5 2 , k )logl 1 - E W fa,*) 

I i=i 

-E^ 2), W) 



Pn <Jn 



i=l j=l J J 

Hence, the proposed sieve MLE with the i?-spline basis functions is the 
maximizer of (2.6) over £l n . 

Remark 2.1. The spline-based sieve MLE in Q n is the MLE defined in 
a sub-class of T . Hence, the spline-based sieve MLE is anticipated to have 
good asymptotic properties if this sub-class "approximates" to T as n — > oo . 

3. Asymptotic properties. In this section, we describe asymptotic prop- 
erties of the tensor spline-based sieve MLE of joint CDF with bivariate cur- 
rent status data. Study of the asymptotic properties of the proposed sieve 
estimator requires some regularity conditions, regarding the event times, ob- 
servation times and the choice of knot sequences. The following conditions 
sufficiently guarantee the results in the forthcoming theorems. 

Regularity conditions: 

(CI) Both 9Fo qs'^ and ^-^^ have positive lower bounds in [Li,£7i] x 
[L 2 ,U 2 ]. _ 

(C2) - q°q^ has a positive lower bound &o in [Li,{7i] x [L 2 ,U 2 ]. 

(C3) Fo(s,t),Fo > i(s) and Fo ;2 (t) are all continuous differentiable up to 
order p in domain [Li,£7i] x [L 2 ,U 2 ], [Li,Ui] and [L 2 ,U 2 ], respectively. 

(C4) The observation times (Ci,C 2 ) follow a bivariate distribution de- 
fined in [Zi,"Ui] x ^2,^2]) with l\ > L\,u\ < U\, l 2 > L 2 and u 2 <U 2 . 

(C5) The density of the joint distribution of (Ci, C 2 ) has a positive lower 
bound in [Zi,ui] x [£2,^2]- 

(C6) The knot sequences {uj}^™^ and of the S-spline basis func- 

W.hPn rATVlhm „„«„a, + W WV, min, , min, A^> 



tions, {Nr hl }?\ and {Nf hl }f_„ satisfy that both H \ u) and 

1 Jl - i L 3 '3-^ J max, A ( ' 



max, A. max, A 

3 3 
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(u) 

have positive lower bounds, where A 4 = Uj+i — Ui for i = I, . . . ,p n and 
Aj = vj+i - Vj for j = /,... , q n . 

Remark 3.1. (CI) implies that dF °^ s>> and dF °^ have positive lower 
bounds on [Li,[/i] and [L2, U2], respectively. (C3) implies that both dFo gg'^ 
and 9F0 Qt'^ have positive upper bounds in [Li,f7i] x [L 2 , U2]', dF °^ S ^ and 
dF °^ — have positive upper bounds on [Li,Ui] and [L2, U2], respectively. 

Let 

{Pn qn 
r = (F n ,F nil ,F n> 2):F n {s,t) = ^^ayiV^s)^ 2 ^), 
i=i j=i 

Pn qn ~\ 

F n,l{s) =J2^NP'\ S ),F n ,2(t) =^2^ >\t) , 
i=l 0=1 J 

with a= (ai > i,...,a Pntqn ), §_ = (fa, . . . , (3 p J and 7 = (71, . . . ,j qn ) subject to 
the following conditions in (3.1): 

"1,1 > 0, 

aij+i - aij > for j = 1, . . . ,q n - 1, 
af+1,1 - > for i = 1, . . . ,p n - 1, 
- oti+ij) - (aij+i - ajj) 

for i = l,...,p n - l,j = l,...,qr n - 1, 

A - ai, 9n > 0, 71 - a P „,i > 0, 

(A+l - Pi) ~ («i+l,g„ - «i,?„) > for i = 1, . . . ,p n - 1, 
(7j+i - 7j) - («p„ j+i - a P n,i) > for j = 1, . . . , 9n - 1, 

fip„ + lq n ~ a Pn,qn — 1- 

Remark 3.2. Note that S7 n> i is a sub-class of O n due to the change from 
the forth inequality of (2.5) to that of (3.1). The choice of Q n ,i is mainly for 
the technical convenience in justifying the asymptotic properties. In the forth 
inequality of (3.1), bo is the positive lower bound of 9 Qsdt^ stated in (C2). 

This inequality will guarantee that 9 Qsdt ^ a ^ so nas a positive lower bound 
which is necessary for the proof of Lemma 0.1 in the supplemental article 
[Wu and Zhang (2012)]. It is obvious that as sample n increases to infinity, 
the right-hand side of the forth inequality in (3.1) will approach to 0. 



(3.1) 
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We study the asymptotic properties in the feasible region of the ob- 
servation times: [Zi,iti] x [Z2, ^2] ■ Let £l' n = {t G fl n ,i, for (s,t) G x 
[h,u 2 ]} and let r = (Fo(s,t),Fo,i(s),Fo, 2 (*)) with (s,t) £ [h,^] x [h,u 2 ]. 
Under (C4), the maximization of l n (a, (3, 7; data) over Q, n i is actually the 
maximization of l n (a, /3, 7; data) over Q' n . Throughout the study of asymp- 
totic properties, we denote f n the maximizer of l n (a, /3, 7; data) over Cl' n . 

Denote L r (Q) the norm associated with a probability measure Q which 
is defined as 

II/IImq) = (Q\f\ r ) 1/r = (/ l/r^)^ 

In the following, L r (Pd ,c 2 ) 1 ^r(-Fb x ) and L r (Pc 2 ) are denoted as the L r - 
norms associated with the joint and marginal probability measures of the 
observation times (Ci,C?2), respectively, and L r (P) is denoted as the L r - 
norm associated with the joint probability measure P of observation and 
event times (Ti , T 2 , Ci, C 2 ) . 

Based on the L 2 -norms, the distance between r n = (F n , F n< i, F n ,2) £ 
and ro = (Fq, Fq^i, Fq^) is defined as 

d(T n ,T ) 




Theorem 3.1. Suppose (C2)-(C6) hold, and Pn = 0{n v ), q n = 0{n v ) 
for v < 1; that is, the numbers of interior knots of knot sequences {i£i}f n+ ' 
and {vj}f n+l are both in the order of n v for v < 1. Then 

d(f n ,To)^ p O as n— > 00. 

Theorem 3.2. Suppose (C1)-(C6) ftoW, and Pn = 0{n v ), q n = 0{n v ) 
for v < 1; that is, the numbers of interior knots of knot sequences {ui}i" +l 
and {vj}\ n+l are both in the order of n v for v < 1. Then 

d(f n ,T ) = O p (n~ mi ^ 1 - 2v W). 

Remark 3.3. Theorem 3.2 implies that the optimal rate of convergence 
of the proposed estimator is n p ^ 2 (' p+1 ^ , achieved by letting pv = (1 — 2v)/2. 
This rate is equal to n 1 / 3 when p = 2 and improves as p (the degree of 
smoothness of the true joint distribution) increases. Nonetheless, the rate 
will never exceed n 1 / 2 . The result of Theorem 3.2 also indicates that the 
proposed method potentially results in an estimate of the targeted joint 
CDF with a faster convergence rate than the conventional NPMLE method 
given in Song (2001). 

4. Computation of the spline-based sieve MLE. For the -B-spline-based 
sieve MLE, the constraint set (3.1) complicates the numerical implementa- 
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tion. We propose to compute the sieve MLE using /-spline basis functions for 
the sake of numerical convenience. The /-spline basis functions are defined 
by Ramsay (1988) as 

0, i > j, 
(4.1) 4(s) = \ J2(u m+l+l - u m )M l + l (s)/(l + 1), j-l + l<i<j, 

m=i 

1, i<j-l + l 

for Uj < s < Uj+i, where M^s are the M-spline basis functions of order I, 
studied by Curry and Schoenberg (1966), and can be calculated recursively 
by 

M}(s) = , Ui<s<u i+1 , 

M l ( s = *[(* - Uj)M]-\s) + (Uj+t - s)M l ~l(s)] 
l[S) {l-l)(u t+l - Ul ) 

By the relationship between the £>-spline basis functions and the M-spline 
basis functions [Schumaker (1981)], it can be easily demonstrated that the 
/-spline basis function defined in (4.1) can be expressed by a sum of the 
/?-spline basis functions 

Pn 

(4-2) I l -\s) = Y,NL(s). 

m=i 

Consequently, the spline-based sieve space can be reconstructed using the 
/-spline basis functions with a different set of constraints: 

{Pn q n 
r n = (F n , F n>1 ,F ni2 ) : F n (s, t) = ]T £ thjI^' 1 ' 1 (s)lf' l ~\t) , 
i=l j=l 

Pn ( qn 

i=i ij=i j 



qn ( Pn 



with 2= (r?i,i,-.-,7?p n , 9 „), uj_=(uJi,...,uj Pn ) and 7[= (7ri,...,7r gn ) subject to 
the following conditions in (4.3), 

Vi,j > for i = 1, . . . ,p n ,j = l,...,q n , 
(4.3) Wi>0, £ = l,...,p n) 
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vi"j>0, j = l,...,q n , 

Pn Qn Pn Qn 

i=i j=i i=i j=i 

Then the spline-based sieve log likelihood can be also expressed in /-spline, 
and the spline-based sieve MLE can be obtained by maximizing the log 
likelihood in /-spline over Q n . 

Remark 4.1. Class n is actually equivalent to Q n , and hence the 
/-spline-based sieve MLE is the same as the .B-spline-based sieve MLE. 
It is advocated in numerical implementation due to the simplicity of the 
constraints in class O n . 

Given p n and q n , the proposed sieve estimation problem described above 
is actually a restricted parametric maximum likelihood estimation problem 
with respect to the coefficients associated with the /-spline and the tensor I- 
spline basis functions. Jamshidian (2004) generalized the gradient projection 
algorithm originally proposed by Rosen (1960) using a weighted L2-norm 
||x|| = x'Wx with a positive definite matrix W for the restricted maximum 
likelihood estimation problems. Because the constraint set (4.3) is made by 
linear inequalities, the maximization of (2.2) in the /-spline form over n can 
be efficiently implemented by the generalized gradient projection algorithm 
[Jamshidian (2004)] and is described as follows. 

First we rewrite (4.3) as XO < y, where X = (xi,x 2 , ■ ■ ■ ,x Pn . qn+Pn+qn , 

X Pn-q n +Pn+qn + l) T With *\ = ( ~ 1 , 0, . . . , 0) T , X 2 = (0 , - 1 , 0, . . . , 0) T , 

x Pn-q n + P n+qn = (0, . . . , , - 1 ) T , x Pn .q n+Pn+qn +i = (1,...,1) T ; 6 = (r/,w,vr) = 
(9i,02, . • • , Q Pn -q n +p n +q„)', and y = (0, . . . , 0, 1) T . If some /-spline coefficients 
equal or all coefficients sum up to 1, then we say their corresponding 
constraints are active and let X6 = y represent all the active constraints 
and a vector A of integers to index the active constraints. For example, if 
A = (2, l,p n ■ q n + p n + q n + 1), then the second, first and last constraints 
become active, and X = (x2,x 1 ,x Pn . qn+Pn+qn+1 ) T and y= (0,0, 1) T . 

Let 1(9) and H(0) be the gradient and Hessian matrix of the log likelihood 
given by (2.2) in the /-spline form, respectively. Note that H(6) may not be 
negative definite for every 6. We use W = —H(9) + 51, where / is identity 
matrix, and S > is chosen as any value that guarantees W being positive 
definite. With that introduced, the generalized gradient projection algorithm 
is implemented as follows. 

Step 1 (Computing the feasible search direction). Compute 
d= (di,d 2 ,. ■ ■ , d Pn . qn+Pn+qn ) 

= {i- w~ 1 x T (xw~ 1 x T y 1 x}w~ 1 i(9). 
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Step 2 (Forcing the updated 9 to fulfill the constraints). Compute 

Y _ y-^Pn-qn+Pn+qn ^ 



mm< mm 



kdiKoX diY Y$^ n+pn+qn di 

Pn-qn+Pn+q n 

if ]T di> o, 



min < — - > , else. 

I. i:di<0 I di) 

Doing so guarantees that 9i + ^fdi > for i = 1, 2, . . . ,p n ■ q n + p n + q n , 
and £f=r +Pn+9n (0i+7di)<l- 
Step 3 ( Updating the solution by step-halving line search). Find the smallest 
integer k starting from such that 

[„(fl + (i/2) fc 7 d;-)>^;-)- 

Replace 9 by 9 = 9 + mm{(l/2) fe 7, 0.5}d. 
Step 4 (Updating A and X). Modify A by adding indexes of new /-spline 

coefficients when these new coefficients become and adding p n ■ 

Qn+Pn + Qn + 1 when the sum of all /-spline coefficients becomes 1. 

Modify X accordingly. 
Step 5 (Checking the stopping criterion). If > e, for small e, go to Step 

1; otherwise, compute A = (XW~ 1 X T )~ 1 XW~ 1 l(9). 

(i) If the jth component Xj > for all j, set 9 = 9 and stop. 

(ii) If there is at least one j such that Xj < 0, let j* = argminj.^^olAj}, 
then remove j*th component from A and remove the J*th row 
from X, and go to Step 1. 

5. Simulation studies. Copula models are often used in studying bivari- 
ate event time data [Shih and Louis (1995), Wang and Ding (2000), Sun, 
Wang and Sun (2006), Zhang et al. (2010)] 

We consider the bivariate Clayton copula function 

C a (u,v) = (u^ + v^-l) 1/{1 - a \ 

with a > 1. For the Clayton copula, a larger a corresponds to a stronger 
positive association between the two random variables. The association pa- 
rameter a and Kendall's r for the Clayton copula is related by r = 

In the simulation studies, we compare the proposed sieve MLE to the con- 
ventional NPMLE, computed using the algorithm developed by Maathuis 
(2005). As we mentioned previously, this NPMLE is not unique. Only the 
total mass in each selected rectangle is estimated, therefore the estimated 
joint CDF is based on where the mass is placed in each rectangle. We de- 
note U-NPMLE and L-NPMLE as the NPMLE for which the probability 
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mass is placed at the upper right and lower left corners of each rectangle, 
respectively. 

The proposed sieve MLE and both U-NPMLE and L-NPMLE are eval- 
uated with various combinations of Kendall's r (r = 0.25,0.75) and sample 
sizes in = 100,200). Under each of the four settings, the Monte-Carlo sim- 
ulation with 500 repetitions is conducted, and the cubic (I = 4) /-spline 
basis functions are used in the proposed sieve estimation method. The event 
times (Ti, T 2 ), monitoring times (C±, C 2 ) and the knots selection of the 
cubic /-spline basis functions are illustrated as follows: 

(i) (Event times). (T\,T 2 ) are generated from the Clayton copula with 
the two marginal distributions being exponential with the rate parame- 
ter 0.5. Under this setting, Pr(T; > 5) < 0.1 for i = 1, 2 and [L x , J7i] x [L 2 , U 2 ] 
is chosen to be [0,5] x [0,5]. 

(ii) (Censoring times). Both C\ and C 2 are generated independently 
from the uniform distribution on [0.0201,4.7698] [Pr(0 < Tj < 0.0201) = 
Pr(4.7698 < T { < 5) = 0.01, for i = 1,2]. The observation region [h,Ui] x 
[h,u 2 ] = [0.0201, 4.7698] x [0.0201, 4.7698] is inside [0, 5] x [0, 5] and the CDFs 
are bounded away from and 1 inside the observation region. 

(iii) (Knots selection). As in other spline-based estimations [Lu, Zhang 
and Huang (2007, 2009), Zhang, Hua and Huang (2010) and Wu and Gao 
(2011)], the number of interior knots m n is chosen as [n 1 / 3 ] — 1, where [n 1 / 3 ] 
is the integer part of n 1 / 3 . For moderate sample sizes, say n = 100,200, our 
experiments show that m n = [n 

1/3] _ 1 

is a reasonable choice for the number 
of interior knots. Therefore, we choose 4 and 5 as the numbers of interior 
knots for sample sizes 100 and 200, respectively. The number of spline basis 
functions is determined by p n = q n = m n + 4 in our computation. Two end 
knots of all knot sequences are chosen to be and 5. For each sample of 
bivariate observation times (C\,C 2 ), the interior knots for {/^ 1 ^' 3 }|" 1 and 

(2") 3 

1 are allocated at the kj (m n + 1) quantiles (k — 1, ... , m n ) of the 
samples of C\ and C 2 , respectively. 

Table 1 displays the estimation biases (Bias) and the square roots of mean 
square errors (MSE 1//2 ) from the Monte-Carlo simulation of 500 repetitions 
for the proposed sieve MLE (Sieve) and both U-NPMLE ( U-Non) and L- 
NPMLE (L-Non) of the bivariate CDF at 4 selected pairs of time points 
(si,s 2 ) near the corners of the estimation region with different sample sizes 
and Kendall's r values. The estimation results at those selected points are 
comparable among the three estimators. Table 2 presents the overall esti- 
mation bias and mean square error for the three estimators by calculating 
the average of absolute values of estimation bias and the average of square 
roots of mean square error taking from 2209 pairs of (^1,^2) with both s± 
and s 2 ranging uniformly from 0.1 to 4.7. It appears that the sieve MLE 
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Table 1 

Comparison of the estimation bias and square root of mean square error among the sieve 
MLE, U-NPMLE and L-NPMLE at four selected points 



T 2 



0.1 4.6 



Ti 


Sieve 


U-Non 


L-Non 


Sieve 




U-Non 


L-Non 


Sample size n = 


100, Kendall's 


r = 0.25 
















0.1 Bias 


-5.00e-3 - 


-1.78e-2 


-1.91e- 


2 


2.69e- 


2 


-2.22e-3 


-3.93e- 


2 


MSE 1/2 


2.75e-2 


2.24e-2 


1.91&- 


2 


7.32e- 


2 


6.68e-2 


5.53e- 


2 


4.6 Bias 


2.33e-2 - 


-2.69e-2 


-4.19e- 


2 


4.04e- 


2 


1.32e-l 


1.09e- 


1 


MSE 1/2 


7.18e-2 


6.68e~2 


5.17&- 


2 


8.24e- 


2 


1.49e-l 


1.35e- 


1 


Sample size n = 


200, Kendall's 


T = 0.25 
















0.1 Bias 


-4.39e-3 - 


-1.85e-2 


-1.91e- 


2 


2.26e- 


2 


-2.87e-2 


-3.89e- 


2 


MSE 1/2 


2.42e-2 


1.98e-2 


1.91e- 


2 


6.05e- 


2 


5.52e-2 


5.04e- 


2 


4.6 Bias 


1.65e-2 - 


-3.29e-2 


-4.03e- 


2 


2.15&- 


2 


1.10e-l 


9.65e- 


2 


MSE 1/2 


5.31e-2 


5.50e-2 


5.30e- 


2 


6.10e- 


2 


1.29e-l 


1.21e- 


1 


Sample size n — 


100, Kendall's 


t = 0.75 
















0.1 Bias 


-1.81e-2 - 


-3.62e™2 


-4.33e- 


2 


2.91e- 


2 


-1.95e-2 


-4.11e- 


2 


MSE 1/2 


4.63e-2 


5.36e-2 


4.34e- 


2 


7.88e- 


2 


8.19e-2 


5.62e- 


2 


4.6 Bias 


3.08e-2 - 


-1.90e-2 


-4.04e- 


2 


1.98e- 


2 


1.03e-l 


8.09e- 


2 


MSE 1/2 


8.16e-2 


8.45e-2 


5.83e- 


2 


6.47e- 


2 


1.22e-l 


1.08e- 


1 


Sample size n = 


200, Kendall's 


r = 0.75 
















0.1 Bias 


-2.03e-2 - 


-4.00e-2 


-4.31e- 


2 


2.01&- 


2 


-2.48e-2 


-3.81e- 


2 


MSE 1/2 


3.86e-2 


4.52e-2 


4.37e- 


2 


5.87e- 


2 


5.90e-2 


5.27e- 


2 


4.6 Bias 


2.09e-2 - 


-2.48e-2 


-3.93e- 


2 


1.08e- 


2 


8.37e-2 


7.20e- 


2 


MSE 1/2 


6.00e-2 


6.27e-2 


5.35e- 


2 


5.24e- 


2 


1.04e-l 


9.40e- 


2 



Table 2 

Comparison of the overall estimation biases and the overall mean square errors among 
sieve MLE, U-NPMLE and L-NPMLE 



Sample size 



T 






100 






200 




Sieve 


U-Non 


L-Non 


Sieve 


U-Non 


L-Non 


0.25 


|Bias| 


7.56e-3 


3.24e-2 


4.24e-2 


6.70e-3 


2.62e-2 


3.11e-2 




MSE 1 / 2 


7.93e-2 


1.25e-l 


1.26e-l 


6.13e-2 


1.03e-l 


1.03e-l 


0.75 


|Bias| 


l.lle-2 


1.50e-2 


2.49e-2 


7.29e-3 


1.33e-2 


1.88e-2 




MSE 1/2 


7.40e-2 


1.10e-l 


l.lle-1 


5.77e-2 


8.45e-2 


8.53e-2 
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Fig. 1. Comparison of the estimation bias between the sieve MLE (left) and the 
U-NPMLE (right) for the joint CDF when sample size n — 200, Kendall's r — 0.25. 




Fig. 2. Comparison of the estimation bias between the sieve MLE (left) and the 
U-NPMLE (right) for the joint CDF when sample size n — 200, Kendall's r = 0.75. 




Fig. 3. Comparison of the square root of mean square error between the sieve MLE (left) 
and the U-NPMLE (right) for the joint CDF when sample size n = 200, Kendall's r = 0.25. 

outperforms its counterparts with a smaller overall bias and a smaller over- 
all mean square error. The mean square error of the proposed sieve MLE 
noticeably decreases as sample size increases from 100 to 200. 

For sample size n = 200, the estimation biases and the square roots of 
mean square error of the sieve MLE and U-NPMLE for the joint CDF from 
the same Monte-Carlo simulation are graphed in Figure 1 through Figure 4 
for Kendall's r = 0.25 and 0.75. These figures clearly indicate that the bias 
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Fig. 4. Comparison of the square root of mean square error between the sieve MLE (left) 
and the U-NPMLE (right) for the joint CDF when sample size n = 200, Kendall's t = 0.75. 




12 3* 5 1 2 3 4 S 



Fig. 5. Comparison of the estimation bias between the sieve MLE and the U-NPMLE 
for estimating the marginal CDF ofT\ when sample size n = 200 (left: Kendall's r = 0.25; 
right: Kendall's r = 0.75 ). 

and the MSE of the sieve MLE are noticeably smaller than that of U-NPMLE 
inside the closed region [0.1,4.7] x [0.1,4.7]. It is also seen that the bias of 
the sieve MLE near the origin increases as Kendall's r increases. As a by- 
product of the estimation methods, the average estimate of the marginal 
CDF of T\ from the same Monte-Carlo simulation for both the proposed 
sieve MLE [Sieve) and U-NPMLE ( U-Non) are also computed and plotted 
in Figure 5 along with the true marginal CDF (True), F±. Figure 5 clearly 
indicates that the bias of the proposed sieve MLE for the marginal CDF is 
markedly smaller than that of the U-NPMLE, particularly near the two end 
points of interval [0.1,4.7]. 

6. Final remarks. The estimation of the joint CDF with bivariate event 
time data is a challenging problem in survival analysis. Development of 
sophisticated methods for this type of problems is much needed for ap- 
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plications. In this paper, we develop a tensor spline-based sieve maximum 
likelihood estimation method for estimating the joint CDF with bivariate 
current status data. This sieve estimation approach reduces the dimension of 
unknown parameter space and estimates both the joint and marginal CDFs 
simultaneously. As a result, the proposed method enjoys two advantages 
in studying bivariate event time data: (i) it provides a unique estimate for 
the joint CDF, and the numerical implementation is less demanding due 
to dimension reduction; (ii) the estimation procedure automatically takes 
into account the possible correlation between the two event times by satis- 
fying the constraints, which intuitively results in more efficient estimation 
for the marginal CDFs compared to the existing methods for estimating the 
marginal CDFs using only the univariate current status data. 

Under mild regularity conditions, we also show that the proposed spline- 
based sieve estimator is consistent and could converge to the true joint CDF 
at a rate faster than re 1 / 3 if the target CDF is smooth enough. Both theo- 
retical and numerical results provide evidence that the proposed sieve MLE 
outperforms the conventional NPMLE studied in the literature. The supe- 
rior performance of the proposed method mainly rests on the smoothness 
of the true bivariate distribution function. In many applications of bivari- 
ate survival analysis, this assumption of smoothness is reasonable and shall 
motivate the use of the proposed method. 

Though the development of the proposed method is illustrated with bi- 
variate current status data as it algebraically simplifies the theoretical jus- 
tification, the proposed method can be readily extended to bivariate inter- 
val censored data [Song (2001) and Maathuis (2005)] as well as bivariate 
right censored data [Dabrowska (1988) and Kooperberg (1998)] with paral- 
lel theoretical and numerical justifications. It is potentially applicable in any 
nonparametric estimation problem of multivariate distribution function. 

While the consistency and rate of convergence are fully studied for the 
proposed estimator, the study of its asymptotic distribution is not accom- 
plished. With the knowledge of asymptotic distribution of the conventional 
NPMLE for current status data studied in Groeneboom and Wellner (1992) 
and Song (2001), it is for sure that the asymptotic distribution of the pro- 
posed estimator will not be Gaussian. Discovering the limiting distribution 
for the proposed estimator remains an interesting yet a very challenging 
problem for future investigation. 

7. Proofs of the theorems. For the rest of this paper, we denote K as 
a universal positive constant that may be different from place to place and 
Pn/ = \ Y^i=i f(Xi), the empirical process indexed by f{X). 

Proof of Theorem 3.1. We show f n is a consistent estimator by 
verifying the three conditions of Theorem 5.7 in van der Vaart (1998). 
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For (s,t) £ [h,ui] x [Z 2 ,it 2 ], we define 0, by 
n = { T (s,t) = (F(s,t),F 1 (s),F 2 (t)): 

t satisfies the following conditions (a) and (b)}: 

(a) F(s,t) is nondecreasing in both s and t, F\(s) — F(s,t) is nonde- 
creasing in s but nonincreasing in t, F 2 (t) — F( s ,t) is nondecreasing in t but 
nonincreasing in s, and 1 — -Fi(s) — F 2 (t) + F(s, t) is nonincreasing in both s 
and t, 

(b) F(s,t)>h, F 1 (s)-F(s,t)>b 2 , F 2 {t)-F{s,t)>b 3 and 1-F^s)- 
F 2 {t) + F{s,t) > 64, for 61 > 0, b 2 > 0, 63 > and 64 > 0. 

Since (C2) and (C6) hold, Lemma 0.1 in the supplemental article [Wu 
and Zhang (2012)] implies that there exist b\ > 0, b 2 > 0, 63 > and 64 > 
small enough to guarantee that To € Q and Q' n G Q. We suppose b\, b 2 , 63 
and 64, in condition (b) above, are chosen small enough such that fi contains 
both To and £l' n . 

Denote C = {/(t) :t £ Q} the class of functions induced by the log likeli- 
hood with a single observation x = (s, t, 81, 6 2 ), where 

Z( T ) = logics, t) + <5i(l - <5 2 ) log[Fi(a) - F(s, t)] 

+ (l-6 1 )8 2 log[F 2 (t)-F(s,t)} 

+ (1 - 6i)(l - <y 2 )log[l - Fx(s) - F 2 (i) + F(s,i)], 

with <5i = l [Tl < s] , 5 2 = l[T 2 <t] • We denote M(r) = P/(r) and M n (r) = P n (/(r)). 
First, we verify sup TgQ |M n (r) - M(r)| 0. 
It suffices to show that £ is a P-Glivenko-Cantelli, since 

sup|M n (r)-M(r)|= sup |(P„ - P)l(r)\ ^ p 0. 

Let A 1 = {^gM :r = (F,F 1 ,F 2 )GO}, and & = {l [h , sM i 2 , t] , h < « < 

ui,/ 2 < i < u 2 }. By conditions (a) and (b), we know < < 1 and 

10 foglt'^ * s non i ncreasm g in both s and t. Therefore A\ C sconv(^i), the clo- 
sure of the symmetric convex hull of Q\ [van der Vaart and Wellner (1996)]. 
Hence Theorem 2.6.7 in van der Vaart and Wellner (1996) implies that 

(7.1) N(e,Q u L 2 {Q Cx ^))<k[- 

for any probability measure Qd,C 2 °f (Ci,C 2 ). By the facts that V{Q\) = 3 
and the envelop function of Q\ is 1. (7.1) is followed by 

A\ 4/3 

logN(£,scom(g 1 ),L 2 {Qc 1 ,c 2 ))<Kl-\ , 
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using the result of Theorem 2.6.9 in van der Vaart and Wellner (1996). Hence 

i \ 4/3 

(7.2) 
Let 



logN(e,A l ,L 2 (Q Cu c 2 ))<K[-\ . 



A[ = {5^2 log F(s, t):r= (F, F 1 ,F 2 ) G n}. 

Suppose the centers of e-balls of A\ are fi,i = 1,2, ... , [lf(~) 4 / 3 ], and then 
for any joint probability measure Q of (T±, T 2 , C\, C 2 ), 

\\5AlogF-5AloghfiW 2 



L 2 (Q) 



Q 

E 



5i5 2 \og 6i 



logF 
log&i 



1 [t 1 <c 1 ,t 2 <c 2 ] 1 °S h 



log F(C 1 ,C 2 ) 



El E 



1 [Ti<Ci,r 2 <c* 2 ] lo sh 
F (C 1 ,C 2 )log6i 



log bi 
\ogF(d,C 2 ) 



fi(C\,C 2j 



logfri 

\ogF{C x ,C 2 ) 
log^i 



fi(C\,C 2j 



C\,C 2 



logfel : 7 fi(Cl,C 2i 

log&i 

2 



fi(Ci,C 2/ 

2 



log 


;F f 


log 





= E Ci,C 2 

<E Cl ,c 2 
= (logh) 

L 2 {Q Cl ,c 2 ) 

Let b\ = — log b\ then <5i<52 log b\fi, i = 1, 2, . . . , [K(i) 4//3 ] , are the centers 

of e6i-balls of A[. Hence by (7.2) we have log N {eh, A^L^Q)) < K{±) 4/3 , 
and it follows that 

J sup yj log N(ebi , A\ , L 2 (Q) ) de < J ^(j) de<oo. 

It is obvious that the envelop function of A\ is b\, therefore A± is a P- 
Donsker by Theorem 2.5.2 in van der Vaart and Wellner (1996). 
Let 

A' 2 = {^(1 - ,5 2 )log(Fi( S ) - F(s,t)):T = (F,F U F 2 ) G Q}, 
A' 3 = {(1 - 6J82 log(F 2 (t) - t)):r = (F,F\,F 2 ) G 0} 



and 



4 = {(1 - 6i)(l - 8 2 ) log(l - Fx( S ) - F 2 (i) - F( S ,t)) : 

r = (F,F 1 ,F 2 )en}. 



20 Y. WU AND Y. ZHANG 

Following the same arguments for showing A[ being a P-Donsker, it can 
be shown that A 2 ,A' 3 and A'^ are all P-Donsker classes. So C is P-Donsker 
as well. Since P-Donsker is also P-Glivenko-Cantelli, it then follows that 
su Pi(r)e£ |(P n -P)/(r)|^ p 0. 

Second, we verify M(r ) - M(r) > Pd 2 (r , r), for any re!!. 

Note that 

M(r ) - M(r) 

= P{1(t )-1(t)} 



+ (l-5i)<5 2 log 



F *' ° F x -F 

Fo,2 — Fq 



F 2 -F 



+ (l-ft)(l-fc)log x_ Fl _ F2 + F ) 
= P Cl ,c 2 {Po log ^ + (P ,i - P ) log 

+ (P ,2-Po)iog i? °; 2 ~f 

r 2 — r 

, r-i r? p i 77! \ i 1 — Po,i — Po,2 + -^b 

+ (1 - P ,l - P ,2 + P ) log — = ' 

l — t\ — r 2 + r 

and it follows that 
M(r ) - M(r) = P CliC2 (pmf + (Pi - P)m ' 



(7.3) + (P 2 - F)m 



F ) v y V P - P 

-Fb,2 — Po 



P 2 -P 



+(1 - Fl - F2+F)m ( '-_^-_^;;o )}, 



where m(cc) = xlog(rr) — x + 1 > (x — l) 2 /4 for < x < 5. 
Since P has positive upper bound, 

P Cl ,cJFm(^) | > P ClA |p^ - M > PPci,c 2 (^b — -F) 5 
(7.4) 

=^ll^o- J p|iL ( p 0l , 02 )- 

Similarly, we can easily show that 
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(7.5) 

>K\\(Fo,i-F 1 )-(F -F)\\l ii Pb 1 * i )> 
Fo,2 — Fo 

(7.6) 
and 



P Cl ,cMF 2 -F)m 

>K\\(F , 2 -F 2 )-(F -F)\\l 2{PciC2) 



(7.7) 

> Al(l - F 0jl - F , 2 + F ) - (1 - F 1 - F 2 + F)^^. 

So combining (7.4), (7.5), (7.6) and (7.7) results in 
M(ro)-M(r)>K(||F -F||| 2{Pci|C2) 

+ m,i-F 1 )-(F -F)\\i 2(PciC2) 

+ \\(F , 2 -F 2 )-(F -F)\\l 2{PcitC2) ). 

Let h = \\Fo-F\\l 2[Poi ^ h = \\F , 1 -F 1 \\l iPci) and / 3 = P^-^lli,,^). 
If fi is the largest among /i,/ 2 , / 3 , then 

(7.8) M(r ) - M(r) > Kf x > (K/3)(h + f 2 + / 3 ). 
If f 2 is the largest, then 

(7.9) M(r ) - M(r) > K[h + (/ 2 - f x )] > Kf 2 > (lf/3)(/i + / 2 + /a). 
If /3 is the largest, then 

(7.10) M(r ) - M(r) > [ft + (/ 3 - /i)] > K/ 3 > (lif/3)(/i + / 2 + / 3 ). 
Therefore, by (7.8), (7.9) and (7.10), it follows that 

M(r ) -M(r) >W 2 (r ,r). 
Finally, we verify M re (f n ) - M n (r ) > -Op(l). 

Since (C2), (C3) and (C6) hold, Lemma 0.3 in the supplemental article 
[Wu and Zhang (2012)] implies that there exists r n = (F n ,F nt i,F n>2 ) in Q' n 
such that for r = (F , F 0)1 , F 0)2 ), ||F„ - F |U < ^(n~ p,; ), - >o,i||oo < 

K(n-*" ; ) and ||F n>2 - F , 2 ||oo < K( n -P v ). Since f„ maximizes M n (r) in fi^, 
M n (f n ) - M n (r n ) > 0. Hence, 

M n (f n ) - M n (r ) = M n (f n ) - M n (r n ) + M n (r n ) - M n (r ) 

(7.11) > M n (r n ) - M n (r ) = F„(/(r n )) - P„(Z(r )) 

= (P n - P){l(r n ) - /(r )} + P{l(r n ) - /(r )}. 
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Define 

Cn = {l(r n ):r n = (F n , P n ,i, P n , 2 ) E Q' n , \\F n - PqIU < K(n~ pv ), 
\\F n ,i ~ Fo.illoo < K(n-W), ||F n , 2 - Fq^IU < K{n^°)}. 
Since (a + 6 + c + d) 2 < 4(a 2 + b 2 + c 2 + d 2 ), then for any l(r n ) G C n , we 



have 



P{/(r n )-/(r )} 2 

<4P^ 1 5 2 log^ 



(7.12) 



+ 4P^ 1 (l-5 2 )log^i-^) 2 + 4p( 



+ 4P( (1- (5i)(l -<5 2 )loj 



1-p 



n,l 



(i-(5i)<y 2 ioj 



Fn,2 + Pi \ ' 



P 



n,2 



P, 



P 



0,2 



Pn 



<4P CljC2 log-f +4P CliC2 lo. 



1 — Po,l — ^0,2 + Fq J 
2 



P 



P 



0,1 



Ft 



P 



n,2 



Pn 



1 — F«,l — Pn,2 + Pn V 



+ 4P Cl , C2 log f + 4P Cl , C2 log ' " 

\ ^0,2 — -^0 / \ 1 — -^0,1 — -^0,2 + -^0 / 

The facts that \\F n — Po||oo < if and that Po has a positive lower 

bound result in 1/2 < ^ < 2 for large n. It can be easily shown that if 

1/2 < x < 2, |log(z)| < K\x - 1|. Hence |logf?-| < K\j% - 1|, and it follows 
that 

(7.13) P CuGa 



log 



Fn 
Pn 



<KP Cl ,C 2 



Pn 



<F:P Cl ,C 2 |F n -P | 



0. 



Similar arguments yield 



Pn,l — 


F n 


Fq,i — 


Fo 


F n ,2 — 


Pn 


Fo,2 — 


Fo 


P Cl ,C 2 


log 



(7.14) P CuC2 log ^™ ^ < PP Cl ,C 2 |(F n ,l " Pn) " (P ,l " F )|* ^ 0, 

(7.15) P ClA log 
and 

^ ^ ^1^2 ° 1 — p 01 — p 2 -f p 

Combining (7.12)-(7.16) results in P{l{T n ) - 1(t )} 2 -> 0, as n — 7- oo. Hence 
pp{Z(r„) - /(r )} = {varp[f(T n ) - 1(t )}} 1/2 

(7.17) 

<{P[Z(T n )-Z(To)] 2 } 1/2 ^^0. 



< KP CuC2 \(F n ,2 - Pn) " (P ,2 " Po) 



1 — Pn,l — Pi,2 + F n 
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Since C is shown a P-Donsker in the first part of the proof, Corol- 
lary 2.3.12 of van der Vaart and Wellner (1996) yields that 

(7.18) (P„ - P){l(r n ) - l( To )} = o p (n^ 2 ), 

by the fact that both l{j n ) and 1{tq) are in C and (7.17). 
In addition, 

\P{l(r n ) - 1{t )}\ < P\l{r n ) - 1(t )\ < K{P[l(r n ) - l(r )} 2 } 1/2 -+ n ^oc 0. 
Therefore P(Z(r n ) — 1(tq)) > — o(l) as n — > oo. Hence, 

M n (f n ) - M n (r ) > o p {n- 1 ' 2 ) - o(l) > -o p (l). 
This completes the proof of d(f n , To) — > in probability. □ 

Proof of Theorem 3.2. We derive the rate of convergence by verify- 
ing the conditions of Theorem 3.4.1 of van der Vaart and Wellner (1996). To 
apply the theorem to this problem, we denote M n (r) = M(r) = PI(t) and 
dn(n,T 2 ) = <2(ti,t 2 ). The maximizer of M(r) is r = (F , F ,i 5 -^0,2)- 

(i) Let T n G W n with r n satisfying d(r n , To) < K(n~ pv ) and 5 n = n~ pv . We 
verify that for large n and any 5 > 6 n , 

sup (M(r) - M(t„)) < 

5/2<d(T,T„)<5,Te«£, 

Since d(r,ro) > <i(r, r n ) — d(ro,T n ) > (5/2 — K(n~ pv ), then for large n, 
d(r, To) > if<5. In the proof of consistency, we have already established that 
M(t) - M(r ) < -Kd 2 (T,T ) < -Kb 2 . And as shown in the proof of con- 
sistency, M(r ) - M(r„) < Kd 2 {r^ r„) < K(n~ 2vv ). Therefore, for large n, 
M(t)-M(t„) =M(r)-M(To)+M(r )-M(r n ) < -if <5 2 +iiC(n- 2 ? w ) = -K8 2 . 

(ii) We shall find a function ?/>(•) such that 

E{ sup G n (r-T„))<K^ 

L 5/2<d(r,r n )<<5,re^' n J V n 

and 5 — > tp(5)/5 a is decreasing on 5, for some a < 2, and for r n < it 
satisfies 



r n if)(l/r n ) < K\fn for every n. 



Let 



£n,5 = {/(r) - Z(r„) : r G and 5/2 < d(r, r n ) < 5}. 

First, we evaluate the bracketing number of C n ^. 

Let F n = {F:t = (F,F 1 ,F 2 ) G fi;,<S/2 < d(r,r n ) < 5}, J" n ,i = {Pi :r = 
(F,F 1 ,F 2 ) G fiJ,,$/2 < d(r,r„) < <5} and F n , 2 = {F 2 :r = (F,F 1 ,F 2 ) G ^, 
,5/2<d(r,r n )<5}. 
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Denote r n = (F n ,F nt i,F n>2 ). Lemma 0.5 in the supplemental article [Wu 
and Zhang (2012)] implies that there exist e-brackets [D^,DY],i = 1,2,..., 
[(d/e) KpnQn ] to cover T n — F n . Moreover, Lemma 0.6 in the supplemental ar- 
ticle [Wu and Zhang (2012)] implies there exist e-brackets [Dj L ,Dj' U ],j = 

1,2,..., l(5/e) Kp ™], to cover T n> \ — F n> i, and there exist e-brackets [D 
1,2, ... , [(6/e) K *>], to cover T n>2 - F n>2 . 



(2),i 
k ' 



Denote if = D[ + F n , Ff = D? + F n , F^' L 



+ F n ,i, F t 



D™' L + F n , 2 and F^ u 



= D^ L +F nl ,F^ U 

3 7 3 

F>? ),C/ + F re , 2 . Let 



<Wogif + <5 1 (l-<5 2 )log(Ff ) ' C/ 



Ft) 



+ (1 - S 1 )S 2 log(F^ U -Ft) 



+ (l-5 1 )(l- ( J 2 )log(l-J5 



and 



5 l 5 2 logFt + 5^1-52) log(F 



+ (l-5 1 )5 2 log(i?. 



(2),i 



if) 



+ (l-5 1 )(l-<5 2 )log(l-F{ 1) ' l/ 



F 



Then for any Z(r) E {£ n ,<5 + K r n)}j there exist i,j,k, for i = 1,2,..., 
[(5/e) Kp ^], j = l,2,...,[(S/e) K P n } and k = 1, 2, . . . , [(s/e) Kqn ], such that 
lijk < K r ) ^ fc an< ^ ^he number of brackets [Zj" 3 - fe , Z^ 7 - J's is bounded by 
(5/e) K P" qn • {5/e) K P" ■ (5/e) Kqn . 
Note that 



< 



l 'i,j,k\\ 00 

F u 



+ 



F (i),t/ 
log - 3 



F„ L 



n 



fF 



,(2),C/ 



log- 



F L 



F, 



(2),i 



F/ 7 



+ 



i-f; 

log- 



(!).-& 



F 



(2),L 



+ F [/ 



1 - F 



F 



(2),i7 



+ Ft 



Since for any r S fi^, F has a positive lower bound, then for a small e, 
can be made to have a positive lower bound as well. Combining with 

the fact that F i (s,t) is close to F^(s,t) guarantees that < — 1 < 1 
for i 



1,2, ... , [(6/e) K *"**]. Note that by logx < (x - 1) for < (x - 1) < 1, 



pU pU 

therefore loe -A- < -A- 
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Hence, 

fF 



log 



FF 



< 



Ff 



< 



{F? - Ft 



FF 



<K\\F 



u 



F 



< Ke. 



Similarly, by the definition of $l' n , we can easily show that 



F 

log 



fF 



F 



<Ke, 



log- 



F, 



F 



F 



(2),i 



FF 



< Ke 



and 



log- 



l-F, 



F (2),L pU 



1 



<ife. 



Hence, the fact that L2-norm is bounded by Loo-norm results in 
(7.19) N {] {e,Cn,5,L 2 (P)} < N {] {s,C n , s , \\ • |U} < (5/e) Kp ^\ 

Next, we show that P{l{r) — /(r n )} 2 < Kb 2 for any Z(r) — Z(r n ) £ £„, j( 5. 
Since for any r = (F,F 1 ,F 2 ) with d(r,r n ) < 5, i ? n||L 2 (p Cl >Ca ) <d( r ; T n) < 
<5. Then with (CI), (C3) and (C5), Lemma 0.7 in the supplemental ar- 
ticle [Wu and Zhang (2012)] implies that for a small 5 > and a suf- 
ficiently large n, F and F n are both very close to Fq at every point in 
[Zi,Ui] x [Z2, M2] • Therefore, F and F n are very close to each other at every 
point in [Zi,ui] x [Z2, 1^2] • Then the fact that F n has a positive lower bound 



results in 1/2 < < 2. Hence | log 4-\ < K 



P 



, F 

1 n 



<KPch,C a 



F„l 
F 

K ~ 1 



l-F™ 



1|, and it follows that 



<KP CuC2 \F-F n \ 2 <K6 2 



Again by the definition of £l' n , we can similarly show that, given a small 
5 > 0, when n is large enough, the following inequalities are true: 



Pi 



C\,C 2 



log 



Fi-F 



F<nl F n 



<K5 2 



p c u c 2 



log 



F 2 -F 



F 



n,2 



F n 



<K5 2 



and 



log 



1 - Fi - F 2 + F 



1 — F n i — F n o + F n 



<K5 2 



Hence for any Z(r) — l(r n ) G £ n ,<5> h is true that P{1(t) — l(T n )} 2 < K5 2 . 
It is obvious that £ Hi $ is uniformly bounded by the structure of the log 
likelihood. Lemma 3.4.2 of van der Vaart and Wellner (1996) indicates that 



E P \\G n \\ CnS <KJn{5,Cn,8,L 2 (P)} 



1 + 



J {] {8,C n:S ,L 2 (P)} 
b 2 ^fn 
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where 



J u {5,Cn,s,L 2 (P)}= [ Jl + log N u {e, £ n ,s, L 2 (P)} de < K( Pn q n ) l l 2 5, 



Jo 




Therefore, if pv < (1 - 2v)/2, n 2pv ilj(l/n pv ) < In 1 ! 2 . Moreover, n l ~ 2v x ^(1/ 
n (l-2«)/2j = 2n V2_ This i m pli es if rn = n win{pv,(l-2v)/2} ^ then ^ < ^-l and 

r^(l/r„) ^ifn 1 / 2 . 

It is obvious that M(f n ) — M(r n ) > and d(f n , r n ) < d(f n , To) + d(ro, r n ) — > 
in probability. Therefore, it follows by Theorem 3.4.1 in van der Vaart and 
Wellner (1996) that r n d(f n ,T n ) = O p (l). Hence, by d(r n ,r ) < K(n~P v ) 
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Technical lemmas (DOI: 10. 1214/12- AOS1016SUPP; .pdf). This supple- 
mental material contains some technical lemmas including their proofs that 
are imperative for the proofs of Theorems 3.1 and 3.2. 
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